The data set contains 113,937 loans with 81 variables on each loan. I chose 10 variables from this data set to explore: loan origination date, term, loan status, borrower rate, listing category, occupation, employment status, monthly loan payment, loan original amount and income range.
##   EmploymentStatus Term ListingCategory..numeric.    Occupation
## 1    Self-employed   36                         0         Other
## 2         Employed   36                         2  Professional
## 3    Not available   36                         0         Other
## 4         Employed   36                        16 Skilled Labor
## 5         Employed   36                         2     Executive
## 6         Employed   60                         1  Professional
##   MonthlyLoanPayment LoanOriginalAmount    IncomeRange CreditGrade
## 1             330.43               9425 $25,000-49,999           C
## 2             318.93              10000 $50,000-74,999
## 3             123.32               3001  Not displayed          HR
## 4             321.45              10000 $25,000-49,999
## 5             563.97              15000      $100,000+
## 6             342.37              15000      $100,000+
##   BorrowerState LoanStatus BorrowerRate LoanOriginationDate year
## 1            CO  Completed       0.1580 2007-09-12 00:00:00 2007
## 2            CO    Current       0.0920 2014-03-03 00:00:00 2014
## 3            GA  Completed       0.2750 2007-01-17 00:00:00 2007
## 4            GA    Current       0.0974 2012-11-01 00:00:00 2012
## 5            MN    Current       0.2085 2013-09-20 00:00:00 2013
## 6            NM    Current       0.1314 2013-12-24 00:00:00 2013
From this plot we can see that the number of loans increased from 2005 to 2008, then dropped sharply in 2009. After 2009 the number of loans grew quite fast and peaked in 2013. The data set only contains data before 3/11/2014, so it is not surprising that the number of loans in 2014 is quite small.
From this plot we can see that most people choose a 36-month loan.
The reason why people borrow.
From this plot we can see that most people borrow for debt consolidation.
##              Cancelled             Chargedoff              Completed
##                      5                  11992                  38074
##                Current              Defaulted FinalPaymentInProgress
##                  56576                   5018                    205
##   Past Due (>120 days)   Past Due (1-15 days)  Past Due (16-30 days)
##                     16                    806                    265
##  Past Due (31-60 days)  Past Due (61-90 days) Past Due (91-120 days)
##                    363                    313                    304
Most of the loans are completed or current. Few loans are past due.
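As a rough check on that statement, the counts from the status table can be turned into shares. This is a small Python sketch, not the code used for the analysis; the dictionary just restates the printed counts, and the names `status_counts` and `share` are made up for the illustration.

```python
# Loan status counts copied from the status table in this report.
status_counts = {
    "Cancelled": 5,
    "Chargedoff": 11992,
    "Completed": 38074,
    "Current": 56576,
    "Defaulted": 5018,
    "FinalPaymentInProgress": 205,
    "Past Due (>120 days)": 16,
    "Past Due (1-15 days)": 806,
    "Past Due (16-30 days)": 265,
    "Past Due (31-60 days)": 363,
    "Past Due (61-90 days)": 313,
    "Past Due (91-120 days)": 304,
}

total = sum(status_counts.values())


def share(statuses):
    """Return the fraction of all loans whose status is in `statuses`."""
    return sum(status_counts[s] for s in statuses) / total


# All the "Past Due (...)" buckets taken together.
past_due = [s for s in status_counts if s.startswith("Past Due")]

print(f"total loans:          {total}")
print(f"completed or current: {share(['Completed', 'Current']):.1%}")
print(f"past due:             {share(past_due):.1%}")
```

With these counts, roughly 83% of the loans are completed or current and under 2% are past due. The counts add up to the 113,937 loans in the data set.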
From this plot we can see that the number of loans in CA is by far the largest. The number of loans in FL, IL, NY and TX is also very large.
From this plot we can see that most borrowers have an income range between $25,000 and $74,999.
Most loans have an original amount below $27,000.
Most of the credit grades are left blank, so we need to ignore the blank credit grades when making the bivariate and multivariate plots.
Most of the borrower rates are between 0.05 and 0.35, which are quite high rates. The distribution of the rate looks roughly normal.
The data set contains 113,937 loans with 81 variables on each loan. I chose 12 variables from this data set to explore: loan origination date, term, loan status, borrower rate, listing category, occupation, employment status, monthly loan payment, loan original amount, income range, credit grade and borrower state.
term: The length of the loan expressed in months (12, 36 or 60).
loan status: Cancelled, Chargedoff, Completed, Current, Defaulted, FinalPaymentInProgress, PastDue.
listing category: The category of the listing that the borrower selected when posting their listing (Not Available, Debt Consolidation, Home Improvement, Business, Personal Loan, Student Use, Auto, Other, Baby&Adoption, Boat, Cosmetic Procedure, Engagement Ring, Green Loans, Household Expenses, Large Purchases, Medical/Dental, Motorcycle, RV, Taxes, Vacation, Wedding Loans).
I am interested in which factors influence the borrower rate of a loan, and which factor is the most important. The factors I am interested in include: 1. when (year of the loan) 2. where (borrower state) 3. term of the loan 4. occupation/employment status of the borrower 5. loan amount 6. income range of the borrower 7. credit grade of the borrower
I also explore the reasons why people borrow.
I created a year variable to indicate the year when the loan started.
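The year can be derived directly from the origination timestamp. The snippet below is a minimal Python sketch of that step, not the code used for this report; it assumes the timestamps look like the `LoanOriginationDate` values printed earlier, and `origination_year` is a made-up helper name.

```python
from datetime import datetime


def origination_year(timestamp: str) -> int:
    """Extract the year from a 'YYYY-MM-DD HH:MM:SS' origination timestamp."""
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").year


# LoanOriginationDate values taken from the first rows of the data.
dates = [
    "2007-09-12 00:00:00",
    "2014-03-03 00:00:00",
    "2007-01-17 00:00:00",
]
years = [origination_year(d) for d in dates]
print(years)
```

Parsing the full timestamp, instead of slicing the first four characters, means a malformed date raises an error instead of silently producing a wrong year.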
I changed the listing category from numbers to words to make it easier for readers to understand.
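A lookup list is one simple way to do this recoding. The Python sketch below is only an illustration. It assumes that the numeric codes 0-20 follow the order in which the categories are listed in the variable description; that assumption matches code 1 (debt consolidation) having by far the largest count. The names `LISTING_CATEGORIES` and `listing_label` are hypothetical.

```python
# Labels in the order given in the variable description.
# ASSUMPTION: the position in this list equals the numeric code (0-20).
LISTING_CATEGORIES = [
    "Not Available", "Debt Consolidation", "Home Improvement", "Business",
    "Personal Loan", "Student Use", "Auto", "Other", "Baby&Adoption", "Boat",
    "Cosmetic Procedure", "Engagement Ring", "Green Loans",
    "Household Expenses", "Large Purchases", "Medical/Dental", "Motorcycle",
    "RV", "Taxes", "Vacation", "Wedding Loans",
]


def listing_label(code: int) -> str:
    """Map a numeric listing category code to its label."""
    if 0 <= code < len(LISTING_CATEGORIES):
        return LISTING_CATEGORIES[code]
    return "Unknown"  # codes outside 0-20 are not described in the data


# ListingCategory..numeric. values from the first six rows of the data.
codes = [0, 2, 0, 16, 2, 1]
labels = [listing_label(c) for c in codes]
print(labels)
```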
From the plot below we can see that the average borrower rate differs from year to year, and there are only a few outliers. The average borrower rate is quite small in 2005. From 2006 to 2008 the average borrower rate decreases a little. From 2008 to 2011 it increases, and from 2011 to 2014 it decreases.
From this plot we can see that the longer the term, the higher the rate is likely to be.
The loan rate does not differ much across employment statuses. However, the rate for borrowers who are not employed is higher than for the other statuses.
From the scatter plot I find that the variance of the rate decreases as the loan original amount increases. Below $25,000, the borrower rate seems to have little relation to the loan original amount. For loans over $25,000, the variance of the rate is clearly smaller.
The blue line is the trend line of borrower rate against loan original amount. From this line we can see that the larger the loan original amount, the lower the borrower rate tends to be.
The borrower rate has little relation to the borrower state.
The borrower rate is lower when the borrower's income range is higher. Surprisingly, the $0 income range has the lowest rate. There may be some special loan program for these people.
Clearly, the higher the credit grade, the lower the rate.
Some factors that I thought might affect the loan rate actually have little impact on it. The factors that I find have a relatively strong impact on the loan rate are year of loan, term, income range and credit grade.
The loan original amount, the borrower state and the employment status have little relation to the loan rate.
The time of the loan is the most important factor for the borrower rate.
From the plot below we can see that before 2009 the term of a loan is always 36 months. After 2009, there are loans for 12, 36 and 60 months. When the 12- and 60-month loans first appear (in 2010), their average rate is quite low compared with the rate for 36-month loans, but it increases quite fast.
After 2009, no credit grade is available. But for 2005-2009 we can tell that, in general, the higher the credit grade, the lower the average borrower rate.
The plot below describes how the average borrower rate changed over 2005-2014 for people in different income ranges. I omit the ‘Not displayed’ category of the income range. From the plot we can see that, in general, the higher the income range, the lower the average borrower rate. The basic shape of the average borrower rate over the years is the same for every income range.
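The averaging behind this kind of plot can be sketched in a few lines. The Python snippet below is an illustration only, not the code used for this report. It uses just the six sample rows printed at the start of this report, so the numbers it prints say nothing about the full data set, and `mean_rate_by_year_and_income` is a made-up helper name.

```python
from collections import defaultdict
from statistics import mean

# (year, IncomeRange, BorrowerRate) for the six sample rows shown earlier.
rows = [
    (2007, "$25,000-49,999", 0.1580),
    (2014, "$50,000-74,999", 0.0920),
    (2007, "Not displayed", 0.2750),
    (2012, "$25,000-49,999", 0.0974),
    (2013, "$100,000+", 0.2085),
    (2013, "$100,000+", 0.1314),
]


def mean_rate_by_year_and_income(rows, skip=("Not displayed",)):
    """Average the borrower rate per (year, income range), skipping some ranges."""
    groups = defaultdict(list)
    for year, income, rate in rows:
        if income in skip:
            continue  # the 'Not displayed' category is omitted, as in the plot
        groups[(year, income)].append(rate)
    return {key: mean(rates) for key, rates in groups.items()}


averages = mean_rate_by_year_and_income(rows)
for (year, income), avg in sorted(averages.items()):
    print(year, income, round(avg, 4))
```

Each (year, income range) pair becomes one point on the plot, and connecting the points that share an income range gives one line per range.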
The trend of the average loan rate is nearly the same across different terms, income ranges and credit grades.
In 2012-2014 the average loan rate decreases, but the average rate for unemployed people increases a lot.
##     0     1     2     3     4     5     6     7     8     9    10    11
## 16965 58308  7433  7189  2395   756  2572 10494   199    85    91   217
##    12    13    14    15    16    17    18    19    20
##    59  1996   876  1522   304    52   885   768   771
This plot describes the number of loans in each category. From this plot we can see that most people borrow for debt consolidation.
This plot shows how the loan rate changes over 2005-2014. The red line is the average loan rate at that time. We can see from the plot that the loan rate increases from 2005 to 2011 and decreases from 2011 to 2014.
This plot describes how the average borrower rate changed over 2005-2014 for people in different income ranges. I remove the ‘Not displayed’ category of the income range because it doesn’t contain useful information. From the plot we can see that, in general, the higher the income range, the lower the average borrower rate. The basic shape of the average borrower rate over the years is the same for every income range.
The data set contains 113,937 loans with 81 variables on each loan. I chose 13 variables from this data set to explore their relation to the loan rate: loan origination date, term, loan status, borrower rate, listing category, occupation, employment status, monthly loan payment, loan original amount, income range, credit grade, borrower state and the year variable I created.
I started by understanding the individual variables in the data set, and then I explored the relation of each factor to the loan rate. Then I found the factors that may relate relatively strongly to the loan rate and explored these factors further.
From the exploration, I find that the relatively strong factors for the loan rate are date of loan, credit grade, income range of the borrower and term. What surprises me is that the loan amount has little relation to the loan rate.
When doing this project, I found it hard because many values are blank. For example, the credit grade is only available for 2005-2009 and many employment statuses are not available. So I had to omit these data in the analysis.
I also find that the loan rate is affected by too many features to describe them all in one plot. So I need to split the features, find the most important features or the features I am interested in, and then display them in different plots.
From this project, we know that time, credit grade, income range and term have relatively strong relationships with the loan rate. In future work, I could give each feature a coefficient, which would allow us to predict the loan rate.
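As a very rough idea of what such a coefficient looks like, the Python sketch below fits an ordinary least-squares line with a single feature. It is only an illustration under strong assumptions: it uses loan original amount alone, and only the six sample rows printed at the start of this report, so the fitted numbers are not a result of this analysis. A real model would include the other features and the full data. `fit_line` is a made-up helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# LoanOriginalAmount and BorrowerRate from the six sample rows (toy data).
amounts = [9425, 10000, 3001, 10000, 15000, 15000]
rates = [0.1580, 0.0920, 0.2750, 0.0974, 0.2085, 0.1314]

slope, intercept = fit_line(amounts, rates)


def predict_rate(amount):
    """Predicted borrower rate for a loan amount under the toy fit."""
    return intercept + slope * amount


print(f"coefficient per dollar: {slope:.3e}")
print(f"predicted rate for a $12,000 loan: {predict_rate(12000):.4f}")
```

Categorical features such as term, income range and credit grade would first need to be encoded as numbers (for example one indicator column per level) before they can get coefficients in the same way.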